Fix firstOnly selection behavior #152
`firstOnly` used to select one match per combination (object+chain+segi), so if one object had multiple chains, each chain would match once. This makes it so that `firstOnly` will only match one time per object, on the first segi+chain available (in alphabetical order).
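To illustrate the behavior described above, here is a minimal sketch, not the script's actual code. It assumes each match is a plain dict with `object`, `segi`, `chain`, and `resi` keys; the helper name `first_match_per_object` and the sample records are hypothetical.

```python
from itertools import groupby


def first_match_per_object(matches):
    """Keep one match per object, taken from the alphabetically first segi+chain."""
    # Sort by object, then segi, then chain. The sort is stable, so matches
    # within the same chain keep their original (sequence) order.
    ordered = sorted(matches, key=lambda m: (m["object"], m["segi"], m["chain"]))
    # groupby yields one group per object; next() takes its first match only.
    return [next(group) for _, group in groupby(ordered, key=lambda m: m["object"])]


# Hypothetical matches: objB has hits on two chains, objA on one.
matches = [
    {"object": "objB", "segi": "", "chain": "G", "resi": 10},
    {"object": "objB", "segi": "", "chain": "B", "resi": 20},
    {"object": "objB", "segi": "", "chain": "B", "resi": 35},
    {"object": "objA", "segi": "", "chain": "A", "resi": 5},
]

selected = first_match_per_object(matches)
for m in selected:
    print(m["object"], m["chain"], m["resi"])
```

Under the old behavior, objB would have produced one match for chain B and one for chain G; with the per-object rule only the first chain B match survives.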
Another issue I've noticed: for objects that have residue sequence data available but whose residues lack structural information (as in loops), this script is unable to find the sequence to select. I'm not sure how to update the selection behavior to fix it. |
I'll check the And about your missing residue issue, I have an idea that seems to work. For sure the sequence data is available in RCSB PDB and mmCIF files but may be missing when they are from other sources, I don't know when it is the case. The API only |
The Edit: I checked some values in |
I reverted your commits because I tested them a few hours ago and they weren't working. I shouldn't have merged without testing. You can recover your commits from this PR branch if you need them. Take your time... |
Remove dependency on hardcoded ONE_LETTER dictionary
It worked as expected on my files, but I'd need your files and tests to check if it is working as expected. |
Note that residues without structural information can be retrieved only when using a file with the appropriate metadata. In custom-made files it will skip missing residues, I guess. If pertinent, this should be handled in code or explicitly stated in the documentation. |
Do you have one of these so I can test the code? |
I tested your branch |
I don't know how to force push, so I edited the conflict thing. Another thing that I don't know if we should document in the help function is that this code (and the original version as well) only searches for non-overlapping matches. In a future update, how about we add an option to create a group and put each match into its own selection, which would allow for overlapping matches? |
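For reference, a small sketch of what non-overlapping versus overlapping matching looks like with Python's standard `re` module. This assumes the script relies on ordinary `re` scanning; the toy sequence and the lookahead wrapper are illustrative only and are not taken from the script.

```python
import re

sequence = "AAAA"  # toy sequence, not real data

# Ordinary scanning consumes each match, so the next search starts after it.
non_overlapping = [m.start() for m in re.finditer(r"AA", sequence)]

# Wrapping the pattern in a zero-width lookahead consumes nothing, so every
# start position is tested and overlapping hits are reported as well.
overlapping = [m.start() for m in re.finditer(r"(?=(AA))", sequence)]

print(non_overlapping)  # [0, 2]
print(overlapping)      # [0, 1, 2]
```

Putting each overlapping hit into its own selection, as suggested above, would avoid the problem of two matches sharing residues within a single selection.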
I don't understand this regex in the examples. Is it multi-character? How does it work?
# find the potential N-linked glycosylation sites in 5fyj
fetch 5fyj
findseq N(?=[^P][ST]), 5fyj and chain G+B, 5fyj_pngs |
It looks for a single character. This regex is using a lookahead assertion to match only From wikipedia:
|
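A quick way to see that the pattern matches a single character is to run it with Python's `re` module on a toy string. The sequence below is made up for illustration and is not taken from 5fyj.

```python
import re

# N, only when followed by any residue except P and then S or T.
# The lookahead (?=...) checks the following two residues without consuming them.
pattern = re.compile(r"N(?=[^P][ST])")

sequence = "ANGSNPTNKT"  # toy sequence: N-G-S matches, N-P-T does not, N-K-T matches

hits = [(m.start(), m.group()) for m in pattern.finditer(sequence)]
print(hits)  # [(1, 'N'), (7, 'N')]
```

Each hit is just the `N`; the two residues inside the lookahead decide whether it matches but are not part of the match, so only the asparagine ends up selected.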